UniBic: Sequential row-based biclustering algorithm for analysis of gene expression data.

Authors

Zhenjia Wang

Guojun Li

Robert W Robinson

Xiuzhen Huang

Abstract

Biclustering algorithms aim to provide an effective and efficient way to analyze gene expression data by finding a group of genes with trend-preserving expression patterns under certain conditions. They have been widely developed since Morgan et al. pioneered work on partitioning a data matrix into submatrices with approximately constant values. However, the identification of general trend-preserving biclusters, which are the most meaningful substructures hidden in gene expression data, remains a highly challenging problem. We found an elementary method by which biologically meaningful trend-preserving biclusters can be readily identified from large, noisy, and complex data. The basic idea is to apply the longest common subsequence (LCS) framework to selected pairs of rows in an index matrix derived from an input data matrix, in order to locate a seed for each bicluster to be identified. We tested the method on synthetic and real datasets and compared its performance with currently competitive biclustering tools. The new algorithm, named UniBic, outperformed all previous biclustering algorithms in the commonly used evaluation scenarios, except for BicSPAM on narrow biclusters. BicSPAM was somewhat better at finding narrow biclusters, the task for which it was specifically designed.
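The seed-finding idea in the abstract can be illustrated with a small sketch. It assumes that the "index matrix" stores, for each row, the column indices sorted by increasing expression value, so that a common subsequence of two index rows is a set of columns on which both genes rise in the same order. The function names `index_row`, `longest_common_subsequence`, and `seed_columns` are hypothetical, and the sketch does not cover how UniBic selects row pairs, handles ties, or extends a seed into a full bicluster.

```python
def index_row(values):
    """Column indices ordered by increasing value (ties broken by column index).

    This is an assumed reading of one row of the index matrix.
    """
    return sorted(range(len(values)), key=lambda j: (values[j], j))


def longest_common_subsequence(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming LCS of two sequences."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def seed_columns(row_a, row_b):
    """Columns, in rank order, on which two rows follow the same increasing trend."""
    return longest_common_subsequence(index_row(row_a), index_row(row_b))


if __name__ == "__main__":
    gene_a = [1.0, 5.0, 3.0, 9.0, 7.0]
    gene_b = [2.0, 8.0, 4.0, 6.0, 10.0]
    print(seed_columns(gene_a, gene_b))  # prints [0, 2, 1, 4]
```

In this toy example both rows increase along columns 0, 2, 1, 4 (values 1, 3, 5, 7 and 2, 4, 8, 10), even though their absolute values differ, which is the sense in which the pattern is trend-preserving.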


Similar resources

Application of Cardinality based GRASP to the Biclustering of Gene Expression Data

Biclustering algorithms perform simultaneous row and column clustering of a given data matrix. In a gene expression dataset, a bicluster is a subset of genes that exhibit similar expression patterns across a subset of conditions. Biclustering is a useful data mining technique for identifying local patterns in gene expression data. In this paper biclusters are identified in two steps. In the fir...


Biclustering Gene Expressions Using Factor Graphs and the Max-Sum Algorithm

Biclustering is an intrinsically challenging and highly complex problem, particularly studied in the biology field, where the goal is to simultaneously cluster genes and samples of an expression data matrix. In this paper we present a novel approach to gene expression biclustering by providing a binary Factor Graph formulation of the problem. In more detail, we reformulate biclustering as a se...


Recent patents on biclustering algorithms for gene expression data analysis.

In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes only exhibits a consistent pattern over a subset of conditions. Although used extensively in gene expression data analysis, conventional clustering alg...


Randomized Algorithmic Approach for Biclustering of Gene Expression Data

Microarray data processing revolves around the pivotal issue of locating genes that alter their expression in response to pathogens, other organisms, or multiple other environmental conditions, as revealed by a comparison between infected and uninfected cells or tissues. To have a comprehensive analysis of the corollaries of certain treatments, diseases and developmental stages embodied as a data ...


A New Strategy of Geometrical Biclustering for Microarray Data Analysis

In this paper, we present a new biclustering algorithm to provide the geometrical interpretation of similar microarray gene expression profiles. Different from standard clustering analyses, biclustering methodology can perform simultaneous classification on the row and column dimensions of a data matrix. The main object of the strategy is to reveal the submatrix, in which a subset of genes exhi...



Journal:

Scientific Reports

Volume 6

Publication date: 2016
